
United States Patent and Trademark Office 



UNITED STATES DEPARTMENT OF COMMERCE 
United States Patent and Trademark Office 
Address: COMMISSIONER FOR PATENTS 
P.O. Box 1450 

Alexandria, Virginia 22313-1450
www.uspto.gov 



APPLICATION NO.: 09/820,401
FILING DATE: 03/29/2001
FIRST NAMED INVENTOR: Michael Kahn
ATTORNEY DOCKET NO.: MATP-601US
CONFIRMATION NO.: 5495



7590    05/05/2004

23122
RATNERPRESTIA
P O BOX 980
VALLEY FORGE, PA 19482-0980



EXAMINER: WOZNIAK, JAMES S
ART UNIT: 2655
PAPER NUMBER:

DATE MAILED: 05/05/2004



Please find below and/or attached an Office communication concerning this application or proceeding. 



PTO-90C (Rev. 10/03) 



Office Action Summary

Application No.: 09/820,401
Applicant(s): KAHN, MICHAEL
Examiner: James S. Wozniak
Art Unit: 2655

-- The MAILING DATE of this communication appears on the cover sheet with the correspondence address --
Period for Reply 



A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) FROM 
THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed
after SIX (6) MONTHS from the mailing date of this communication. 

- If the period for reply specified above is less than thirty (30) days, a reply within the statutory minimum of thirty (30) days will be considered timely. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication.

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 
Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any 
earned patent term adjustment. See 37 CFR 1.704(b). 

Status 

1) [X] Responsive to communication(s) filed on 3/10/04.
2a) [X] This action is FINAL.    2b) [ ] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.

Disposition of Claims 

4) [X] Claim(s) 1-20 is/are pending in the application.

4a) Of the above claim(s) is/are withdrawn from consideration.

5) [ ] Claim(s) is/are allowed.

6) [X] Claim(s) 1-5, 11-14 and 16-20 is/are rejected.

7) [X] Claim(s) 6-10 and 15 is/are objected to.

8) [ ] Claim(s) are subject to restriction and/or election requirement.

Application Papers 

9) [ ] The specification is objected to by the Examiner.

10) [X] The drawing(s) filed on 29 March 2001 is/are: a) [ ] accepted or b) [ ] objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).
Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d).

11) [ ] The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152.

Priority under 35 U.S.C. § 119

12) [ ] Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f).
a) [ ] All  b) [ ] Some *  c) [ ] None of:

1. [ ] Certified copies of the priority documents have been received.

2. [ ] Certified copies of the priority documents have been received in Application No. .

3. [ ] Copies of the certified copies of the priority documents have been received in this National Stage
application from the International Bureau (PCT Rule 17.2(a)).

* See the attached detailed Office action for a list of the certified copies not received.

Detailed Action



Response to Amendment 



1. In response to the Office action of 12/11/03, the applicant has submitted an
amendment, filed 3/10/04, amending Claims 1, 7, 11, and 16 without adding new matter, while
arguing to traverse the art rejection based on the amended limitation regarding a "spectral
subtraction filtering method" (Amendment, Page 7).

Applicant's arguments have been considered but are moot in view of the new grounds of
rejection, necessitated by the amended claims, and based on Boll ("Suppression of Acoustic
Noise in Speech Using Spectral Subtraction," 1979).

2. Based on the amendment to Claim 1, the objection directed towards minor informalities 
has been withdrawn. 

3. The draftsperson's objection listed on PTO 948 has been maintained as the informalities 
regarding character of lines, numbers, and letters have not been addressed. 



Claim Objections 



4. Claims 1, 7, 11, and 16 are objected to because of the following informalities:

• With respect to Claim 1, "closed caption on an video display device" should be
corrected to read --closed caption on a video display device--.

• With respect to Claims 7, 11, and 16, "television program to as a closed caption
on an video display device" should be corrected to read --television program as a
closed caption on a video display device--.

Appropriate correction is required.



Claim Rejections - 35 USC § 103



5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are
such that the subject matter as a whole would have been obvious at the time the invention was made to a person
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the
manner in which the invention was made.

6. Claims 1, 2, 4, 16, 17, and 19 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Lange et al. (U.S. Pub: 2001/0025241) in view of Boll ("Suppression of Acoustic Noise in
Speech Using Spectral Subtraction," 1979).

With respect to Claims 1 and 16, Lange discloses:

A method and computer readable carrier (carrier defined as a magnetic or optical disk as
per paragraph 43 in the application specification) (conversion of an audio component of an AV
signal into captions implemented using a computer readable medium containing a program,
Paragraph 16, Lines 6-10) including computer program instructions, respectively, for displaying
text information corresponding to a speech portion of audio signals of a television program to as
a closed caption on an video display device (AV captioning system including a display device,
Fig. 1, Element 60, and a speech-to-text processor, Paragraph 16, Lines 3-6; also, the display
device can be a television capable of processing closed captioning data, Paragraph 23, Lines 1-5),
the method comprising the steps of:

Decoding the audio signals of the television program (signal separation processing
system, Fig. 1, Element 30, for separating the audio signal from an AV signal, Paragraph 19,
Lines 1-7);

Parsing the speech portion into discrete speech components in accordance with a speech 
model and grouping the parsed speech components (vocabulary used for recognizing speech as 
word units, Paragraph 31, Lines 5-7); 

Identifying words in a database corresponding to the grouped speech components 
(vocabulary database used for word recognition within a speech component of an audio signal, 
Paragraph 31, Lines 5-7); and 

Converting the identified words into text data for display on the display device as the 
closed caption (text box area for displaying text as it is transcribed, Paragraph 31, Lines 24-25). 

Lange does not teach the filtering of audio signals by using a spectral subtraction method;
however, Boll discloses:

Filtering the audio signals to extract the speech portion (filtering noise from a speech 
signal through a spectral subtraction noise suppression method, Abstract). 

Lange and Boll are analogous art because they are from a similar field of endeavor in
speech processing for recognition. Thus, it would have been obvious to a person of ordinary
skill in the art, at the time of invention, to combine the use of spectral subtraction to suppress
noise in a speech signal as taught by Boll with the method for converting the audio portion of an
AV signal into captions using a speech recognition and speech-to-text conversion means as
taught by Lange to provide more accurate speech recognition due to increased speech
intelligibility (Boll, Summary and Conclusions) by enhancing a speech portion of an audio signal
through spectral subtraction. Therefore, it would have been obvious to combine Boll with Lange
for the benefit of obtaining increased speech recognition accuracy in a captioning system through
the use of a spectral subtraction method, to obtain the invention as specified in Claims 1 and 16.
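
For readers unfamiliar with the filtering step discussed above, the following is a minimal sketch of magnitude spectral subtraction on a single frame. It is an illustration under stated assumptions, not the implementation of Boll, Lange, or the application: the function names, the naive DFT, and the simple averaging of noise-only frames are all hypothetical choices made to keep the example self-contained.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def estimate_noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    spectra = [[abs(c) for c in dft(frame)] for frame in noise_frames]
    bins = len(spectra[0])
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(bins)]


def spectral_subtract(frame, noise_mag):
    """Subtract the noise magnitude per bin, clamp at zero, keep the noisy phase."""
    cleaned = []
    for coeff, noise in zip(dft(frame), noise_mag):
        magnitude = max(abs(coeff) - noise, 0.0)  # half-wave rectification
        cleaned.append(cmath.rect(magnitude, cmath.phase(coeff)))
    return idft(cleaned)


if __name__ == "__main__":
    n = 16
    speech = [math.cos(2 * math.pi * 1 * t / n) for t in range(n)]        # stand-in for speech
    noise = [0.1 * math.cos(2 * math.pi * 3 * t / n) for t in range(n)]   # stand-in for noise
    noise_mag = estimate_noise_magnitude([noise])
    noisy = [s + v for s, v in zip(speech, noise)]
    print(spectral_subtract(noisy, noise_mag))
```

The sketch reuses the phase of the noisy frame and only modifies magnitudes; a practical system would also window and overlap frames, which is omitted here.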

With respect to Claims 2 and 17, Lange additionally discloses:

A method and computer readable carrier, according to claims 1 and 16, respectively,
wherein the step of filtering the audio signals is performed concurrently with the inherently
concurrent step of decoding of later-occurring audio signals of the television program and step of
parsing of earlier occurring speech signals of the television program (captioning of an AV signal
as it is received, Paragraph 27).

With respect to Claims 4 and 19, Lange further recites: 

A method and computer readable carrier according to claims 1 and 16, respectively, 
further including the step of formatting the text data into lines of text data for display in a closed 
caption area of the display device (suggested option of caption formatting in a two or three-line 
roll-up format, Paragraph 23, Lines 17-25). 
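
As an illustration of the line formatting recited above, the following is a minimal sketch of wrapping recognized text into caption rows and keeping only the most recent rows on screen, in the manner of a two or three-line roll-up display. The 32-character row width, the function name, and the class name are assumptions made for the example and are not taken from Lange or the application.

```python
import textwrap
from collections import deque


def wrap_caption(text, width=32):
    """Break caption text into rows no longer than `width` characters."""
    return textwrap.wrap(text, width=width)


class RollUpCaption:
    """Keeps only the most recent `rows` caption rows, like a roll-up window."""

    def __init__(self, rows=3, width=32):
        self.width = width
        self.window = deque(maxlen=rows)  # older rows scroll off automatically

    def add_text(self, text):
        """Wrap new text, append its rows, and return the rows now visible."""
        for line in wrap_caption(text, self.width):
            self.window.append(line)
        return list(self.window)


if __name__ == "__main__":
    caption = RollUpCaption(rows=2)
    for row in caption.add_text("the quick brown fox jumps over the lazy dog near the river bank"):
        print(row)
```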

Thus, Lange in view of Boll renders obvious the invention as recited in Claims 1, 2, 4, 16, 17, and 19.

7. Claims 3 and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable over Lange
in view of Boll, and in further view of Ortega et al. (U.S. Patent: 6,332,122).

With respect to Claims 3 and 18:



Application/Control Number: 09/820,401 Page 6 

Art Unit: 2655 

Lange in view of Boll teaches the method and computer program for converting the audio
portion of an AV signal into captions that recognizes individual words within speech data by
using a vocabulary database and utilizes spectral subtraction pre-processing, as applied to Claims
1 and 16; however, Lange does not teach the use of a speaker independent model in individual
word recognition as recited in Claims 3 and 18.

Ortega discloses: 

A method and computer readable medium according to claims 1 and 16, respectively, 
wherein the step of parsing the speech portion into discrete speech components includes the step 
of employing a speaker independent model to provide individual words as the parsed speech 
components (speech frames identified through the use of a speaker independent model, Col. 4, 
Lines 35-48). 

Lange, Boll, and Ortega are analogous art because they are from a similar field of
endeavor in speech processing for recognition. Thus, it would have been obvious to a person of
ordinary skill in the art, at the time of invention, to combine the use of a speaker independent
model in speech recognition as taught by Ortega with the method and computer program for
converting the audio portion of an AV signal into captions that recognizes individual words
within speech data by using a vocabulary database and utilizes spectral subtraction
pre-processing as taught by Lange in view of Boll to create a captioning method in which the speaker
can be easily identified by a viewer in the case where the system has not been trained for that
particular speaker. Therefore, it would have been obvious to combine Ortega with Lange in
view of Boll for the benefit of obtaining a captioning method, utilizing spectral subtraction
pre-processing, capable of identifying a spoken utterance even if the speech recognition system has
not been trained with that speaker's voice, to obtain the invention as specified in Claims 3 and
18.



8. Claims 5, 11, 12, 14, and 20 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Lange in view of Boll, and in further view of Ditzik (U.S. Patent: 6,415,256).

With respect to Claims 5 and 20:

Lange in view of Boll teaches the method and computer program for converting the audio 
portion of an AV signal into captions that recognizes individual words within speech data by 
using a vocabulary database and utilizes spectral subtraction pre-processing, as applied to Claims 
1 and 16, however Lange in view of Boll does not teach: a speaker dependent model for 
providing individual phonemes extracted from the speech signal as recited in Claim 5. 

Ditzik suggests: 

A method and computer readable carrier according to claims 1 and 16, respectively, 
wherein the step of parsing the speech portion into discrete speech components includes the step 
of employing a speaker dependent model to provide phonemes as the parsed speech components 
(speech recognition system designed for speaker dependent or speaker independent operation, 
Col. 3, Lines 17-19, and phoneme modeling database used for speech recognition that may
contain special or generalized vocabularies, Col. 3, Lines 52-59).

Lange, Boll, and Ditzik are analogous art because they are from a similar field of
endeavor in speech processing for recognition. Thus, it would have been obvious to a person of
ordinary skill in the art, at the time of invention, to combine the use of a speaker dependent
model in speech recognition to provide phonemes from speech data as taught by Ditzik with the
method and computer program for converting the audio portion of an AV signal into captions
that recognizes individual words within speech data by using a vocabulary database and utilizes
spectral subtraction pre-processing as taught by Lange in view of Boll to create a captioning
method in which the speaker can be easily identified by a viewer in the case where the system
has been trained for that particular speaker and in which a best word choice can be made on an
individual phoneme basis. Therefore, it would have been obvious to combine Ditzik with Lange
in view of Boll for the benefit of obtaining a captioning method, utilizing spectral subtraction
pre-processing, capable of identifying a speaker that has been previously trained within a speech
recognition system featuring a more accurate word identification method on a
phoneme-by-phoneme basis, to obtain the invention as specified in Claims 5 and 20.

With respect to Claim 11:

Lange in view of Boll teaches the method and computer program for converting the audio 
portion of an AV signal into captions that recognizes individual words within speech data by 
using a vocabulary database and utilizes spectral subtraction pre-processing, as applied to Claims 
1 and 16; however, Lange does not teach a phoneme identification as recited in Claim 11.

Ditzik discloses: 

A phoneme generator, which parses the speech portion into phonemes in accordance with
a speech model (speech processing utilizing a phoneme modeling function, which divides speech
into phonemes using phoneme modeling and HMM algorithms, Col. 3, Lines 30-43);

A database of words, each word being identified as corresponding to a discrete set of
phonemes (phoneme modeling database that contains a grammar rules database for discerning
allowable word sequences, word dictionaries, and a phonological rules database for providing
data to the phoneme modeling program, Col. 3, Lines 52-67); and

A word matcher which groups the phonemes provided by the phoneme generator and
identifies words in the database corresponding to the grouped phonemes (word dictionary
suggests word recognition from input phonemes, and inherently, phonemes would also be
grouped to form a word since words can consist of multiple phonemes. Phoneme groupings in
word form are then further grouped in a word sequence using a grammar dictionary, Col. 3,
Lines 55-59).

Lange, Boll, and Ditzik are analogous art because they are from a similar field of 
endeavor in speech processing for recognition. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to combine a means of separating input speech 
into phonemes for word recognition via a database as taught by Ditzik with the method and 
computer program for converting the audio portion of an AV signal into captions that recognizes 
individual words within speech data by using a vocabulary database and utilizes spectral 
subtraction pre-processing as taught by Lange in view of Boll to create a captioning method in 
which a best word choice can be made on an individual phoneme basis through phoneme 
grouping and utilizing a known word database for efficient processing. Therefore, it would have 
been obvious to combine Ditzik with Lange in view of Boll for the benefit of obtaining a 
captioning method, utilizing spectral subtraction pre-processing, and featuring a more efficient 
word identification method on a phoneme-by-phoneme basis by examining only the best word 
choices within a database, to obtain the invention as specified in Claim 11. 
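
To make the phoneme grouping and word database lookup discussed above concrete, the following is a minimal sketch of a word matcher. It uses a greedy longest-match lookup against a small dictionary, which is a simplification chosen for illustration and is not the HMM-based phoneme modeling attributed to Ditzik; the function name, the phoneme symbols, and the two-entry lexicon are hypothetical.

```python
def match_words(phonemes, lexicon):
    """Group a phoneme sequence into words by greedy longest-match lookup.

    `lexicon` maps tuples of phoneme symbols to words. Phonemes that do not
    begin any known word are skipped one at a time.
    """
    longest = max((len(key) for key in lexicon), default=0)
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate group first, then progressively shorter ones.
        for size in range(min(longest, len(phonemes) - i), 0, -1):
            group = tuple(phonemes[i:i + size])
            if group in lexicon:
                words.append(lexicon[group])
                i += size
                break
        else:
            i += 1  # no word starts here; skip this phoneme
    return words


if __name__ == "__main__":
    # Hypothetical lexicon: each word corresponds to a discrete set of phonemes.
    lexicon = {
        ("HH", "EH", "L", "OW"): "hello",
        ("W", "ER", "L", "D"): "world",
    }
    print(match_words(["HH", "EH", "L", "OW", "W", "ER", "L", "D"], lexicon))
```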

With respect to Claim 12:

Lange in view of Boll teaches captioning, utilizing spectral subtraction pre-processing, of
an AV signal in real-time, which is indicative of parallel processing as applied to Claim 2,
however Lange in view of Boll does not teach a phoneme generator as recited in Claim 12.

Ditzik discloses the phoneme modeling function as applied to Claim 11.

Lange, Boll, and Ditzik are analogous art because they are from a similar field of 
endeavor in speech processing for recognition. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to combine a means of separating input speech 
into phonemes for word recognition via a database as taught by Ditzik with the method of 
captioning, utilizing spectral subtraction pre-processing, of an AV signal in real-time as taught 
by Lange in view of Boll to create a captioning method in which a best word choice can be made 
on an individual phoneme basis through phoneme grouping and utilizing a known word database 
for efficient processing. Therefore, it would have been obvious to combine Ditzik with Lange in 
view of Boll for the benefit of obtaining a real-time captioning method, utilizing spectral 
subtraction pre-processing, and featuring a more efficient word identification method on a 
phoneme-by-phoneme basis by examining only the best word choices within a database, to 
obtain the invention as specified in Claim 12. 

With respect to Claim 14:

Lange in view of Boll teaches the method, system, and computer program for converting
the audio portion of an AV signal into captions that utilizes spectral subtraction pre-processing,
recognizes individual words within speech data by using a vocabulary database, and features
decoding, filtering, and processing means in producing a caption as applied to Claim 1, however
Lange in view of Boll does not teach a speaker dependent speech recognition system containing
a phoneme generator as recited in Claim 14.

Ditzik discloses speaker dependent speech recognition as applied to Claim 5 and a
phoneme modeling function as applied to Claim 11.

Lange, Boll, and Ditzik are analogous art because they are from a similar field of 
endeavor in speech processing for recognition. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to combine the use of a speaker dependent 
model in speaker dependent speech recognition to provide phonemes from speech data as taught 
by Ditzik with the method, system, and computer program for converting the audio portion of an 
AV signal into captions that utilizes spectral subtraction pre-processing, recognizes individual 
words within speech data by using a vocabulary database, and features decoding, filtering, and 
processing means in producing a caption as taught by Lange in view of Boll to create a 
captioning method in which the speaker can be easily identified by a viewer in the case where 
the system has been trained for that particular speaker and in which a best word choice can be 
made on an individual phoneme basis. Therefore, it would have been obvious to combine Ditzik 
with Lange in view of Boll for the benefit of obtaining a captioning method, utilizing spectral 
subtraction pre-processing, capable of identifying a speaker that has been previously trained 
within a speech recognition system featuring a more accurate word identification method on a 
phoneme-by-phoneme basis, to obtain the invention as specified in Claim 14. 

9. Claim 13 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lange in view of
Boll, in further view of Ditzik, and in yet further view of Ortega.

With respect to Claim 13:

Lange in view of Boll, and in further view of Ditzik, teaches the captioning system and
computer program utilizing a vocabulary for word identification from phoneme groupings and
spectral subtraction pre-processing, as applied to Claim 11, but does not teach the use of a speaker
independent model as recited in Claim 13.

Ortega recites the use of a speaker independent model in speech frame recognition as 
applied to Claim 3. 

Lange, Ditzik, and Ortega are analogous art because they are from a similar field of 
endeavor in transcribing text from speech. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to combine the use of a speaker independent 
model in speech recognition as taught by Ortega with the captioning system and computer 
program utilizing a vocabulary for word identification from phoneme groupings and spectral 
subtraction pre-processing as taught by Lange in view of Boll, and in further view of Ditzik to 
create a captioning method in which the speaker can be easily identified by a viewer in the case 
where the system has not been trained for that particular speaker and in which a best word choice 
can be made on an individual phoneme basis through phoneme grouping utilizing a known word 
database for efficient processing. Therefore, it would have been obvious to combine Ortega with 
Lange in view of Boll, and in further view of Ditzik for the benefit of obtaining a more flexible 
captioning method, utilizing spectral subtraction pre-processing, capable of identifying speech by 
a speaker even if the speech recognition system has not been trained with that speaker's voice, to 
obtain the invention as specified in Claim 13. 

Allowable Subject Matter



10. Claims 6 and 15 are objected to as being dependent upon a rejected base claim, but 
would be allowable if rewritten in independent form including all of the limitations of the base 
claim and any intervening claims. 

11. Claims 7-10 would be allowable if rewritten or amended to overcome the claim objection
of Claim 7, set forth in this Office action. 

12. The following is a statement of reasons for the indication of allowable subject matter:

Prior art teaches the captioning system that converts the audio portion of an AV signal
through individual phoneme recognition and word sequencing and features speaker
dependent/independent recognition, decoding, filtering, and processing means as applied to
Claims 1-5, 11-14, and 16-20. Additionally, "An Automatic Caption-superimposing System
with a New Continuous Speech Recognizer" by Imai, Ando, and Miyasaka teaches a captioning
system utilizing speech recognition training for error minimization performed before television
signal transmission.

Prior art does not teach:

• A transmitted television signal containing training text used for updating a HMM
to be used in phoneme identification as recited in Claims 6 and 15.

• A transmitted television signal containing training text used for generating a
HMM as recited in Claim 7.

• Claims 8, 9, and 10 further limit their parent claims, and thus, are allowable.

Conclusion

13. Applicant's amendment necessitated the new ground(s) of rejection presented in this
Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event,
however, will the statutory period for reply expire later than SIX MONTHS from the date of this 
final action. 

14. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to James S. Wozniak whose telephone number is (703) 305-8669
and email is James.Wozniak@uspto.gov. The examiner can normally be reached on
Mondays-Fridays, 8:30-4:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Talivaldis Ivars Smits, can be reached at (703) 306-3011. The fax/phone number for
the Technology Center 2600 where this application is assigned is (703) 872-9306.



Any inquiry of a general nature or relating to the status of this application or proceeding
should be directed to the technology center receptionist whose telephone number is
(703) 306-0377.



James S. Wozniak 
4/29/04 




PRIMARY EXAMINER 



